Method of improving signal-to-noise in electropherogram.

A method and apparatus for improving signal-to-noise in an electropherogram acquired by electrophoresis.
The migration time is remapped by binning the data with respect to migration time to improve signal-to-noise
and peak resolution. The data points of the electropherogram are pooled into variable-size bins, each
corresponding to a number of time intervals. The sizes of the bins increase with migration time. To further
improve signal-to-noise and peak resolution, the binned data is filtered by Fourier transformation. The present
invention allows for accurate determination of DNA sequences.
BACKGROUND OF THE INVENTION
1. Field of the Invention



The present invention relates generally to the field of electrophoresis and more particularly to a method 
for improving signal-to-noise in the electropherogram acquired by capillary electrophoresis. 

2. Description of Related Art 

Capillary electrophoresis is a high-resolution, high-sensitivity method for the separation and detection of
molecular species such as peptides, proteins, and oligonucleotides (analytes). Separation is carried out in
the capillary by causing the analytes to migrate at different rates in a separation medium (gel or electrolyte
solution) under the influence of an electric field. Analytes of the same species are resolved into respective
bands as separation takes place. A detector detects the presence of the bands. Detection can be by one of
several schemes, including laser-induced fluorescence (see U.S. Patent No. 4,675,300 to Zare), absorbance
and radioactivity detection.

Typically, the result of detection is represented by a plot of detected intensity versus time, a so-called
electropherogram. By analyzing the electropherogram, one may be able to identify the presence of a particular
species. One may also be able to identify the sample that underwent electrophoresis by looking at the
distribution of the peaks in the electropherogram (for example, in identifying a DNA sample by analyzing
data obtained from electrophoresis of sequenced DNA fragments).

Fig. 1 shows a portion of an electropherogram between 4400 and 4800 seconds of migration, using an arbitrary
zero of time, obtained during electrophoresis. Data was collected every 0.1 second.

It can be appreciated that, for the entire duration of data acquisition, a large set of data points is obtained.
Accordingly, the data storage requirement is substantial. Also, as can be seen in Fig. 1, the use of 0.1 second
digitization results in a plot with rough lines and poor peak resolution. Better resolution can be obtained, but
is impractical, as substantially more data storage space would be required. It is also difficult to determine
whether some of the peaks actually represent the presence of a species, since the signal-to-noise ratio is poor.
It is therefore desirable to improve signal-to-noise as well as resolution of the detection.

SUMMARY OF THE INVENTION 

The present invention is directed to a method for remapping the migration time to improve signal-to-noise,
which facilitates identification of peaks in an electropherogram. The data points of the electropherogram are
pooled into variable-size bins, each corresponding to a number of time intervals having upper and lower limits
that together span a predetermined range of time. The size of the bins increases with migration time.

In accordance with one embodiment of the present invention, the number of time intervals per bin is determined
such that the numbers of time intervals belonging to consecutive bins follow an arithmetic series. The
electropherogram is then represented with respect to a square-root scale, where the average migration time of
each window is approximately proportional to the square of the corresponding bin number.

In another aspect of the present invention, the binned data can be Fourier transformed to data that can
be easily interpreted to identify peaks.

BRIEF DESCRIPTION OF THE DRAWINGS 

Fig. 1 is an actual electropherogram plotted with raw data obtained from electrophoresis of a sample.
Fig. 2 is a schematic diagram of an electrophoresis system in which the present invention can be incorporated.
Fig. 3 is a schematic representation of the data bins in accordance with one embodiment of the present invention.
Fig. 4 is an electropherogram showing data replotted with binned data.
Fig. 5 is a plot of a Fourier transformed electropherogram.
DESCRIPTION OF ILLUSTRATED EMBODIMENTS
An electrophoresis system in which the present invention can be incorporated includes electrodes 16 and 18,
a high voltage source 20, a capillary 22, a detector 24 and a computing unit 26 having data memory 28. To
perform electrophoresis, the capillary 22 is filled with electrolyte and a sample is introduced into one end of
the capillary 22. An electrical circuit is completed through the electrolyte 15, electrodes 16 and 18 and the
high voltage source 20. Under the applied voltage, the components of the sample migrate along the capillary 22
and are resolved into bands as separation takes place. The bands migrate past the detector 24 in sequence.
Detection can be by laser-induced fluorescence, absorbance or radioactivity detection, as described in the
literature. Data is taken at regular intervals (e.g. 0.1 second), thereby obtaining data at discrete data points.

The data is sorted by the computing unit 26 into bins of migration time intervals, and the bin size increases
with migration time. Each data point is assigned to one of a plurality of predetermined bins having upper and
lower limits of time that together span a predetermined range of time. For each data point, the computing unit
26 determines which bin the data falls within and adds the value of the data to the total value stored in that
bin. The memory 28 has a number of memory locations corresponding to the number of bins established. The
binned data is then processed by the computing unit 26 for visual representation as an electropherogram. A
display unit such as a conventional cathode ray tube or a printer (not shown) may be used to display the
electropherogram.
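The binning step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the apparatus software: it assumes the raw readings arrive as a list with one value per unit time interval, starting at the lower limit of the first bin, and that bin sizes follow the arithmetic-series embodiment (first bin of `l1` intervals, each later bin `dl` intervals larger). The function names `bin_upper_limits` and `pool_into_bins` are hypothetical.

```python
import bisect
from typing import List, Sequence


def bin_upper_limits(l1: int, dl: int, num_bins: int) -> List[int]:
    # Upper limit of each bin, counted in unit time intervals.
    # Bin i (1-based) holds l1 + (i - 1) * dl intervals, so sizes grow arithmetically.
    limits, total = [], 0
    for i in range(1, num_bins + 1):
        total += l1 + (i - 1) * dl
        limits.append(total)
    return limits


def pool_into_bins(samples: Sequence[float], l1: int, dl: int, num_bins: int) -> List[float]:
    # Hypothetical sketch: sum raw readings into bins whose size grows with migration time.
    limits = bin_upper_limits(l1, dl, num_bins)
    totals = [0.0] * num_bins                # one memory location per bin
    for k, value in enumerate(samples):      # k = index of the unit time interval
        n = bisect.bisect_right(limits, k)   # first bin whose upper limit exceeds k
        if n < num_bins:                     # readings beyond the last bin are ignored
            totals[n] += value
    return totals


if __name__ == "__main__":
    # Bins of 2, 3 and 4 intervals; a constant reading of 1.0 fills them with 2, 3 and 4.
    print(pool_into_bins([1.0] * 9, l1=2, dl=1, num_bins=3))
```

Each entry of the returned list plays the role of one memory location; the bisection search stands in for "determine which bin the data falls within", after which the value is added to that bin's running total.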

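In one embodiment of the present invention, each bin is made up of unit time intervals of equal size, and the
numbers of unit time intervals for the bins are different, being chosen to give a relationship in which the mean
migration time of each bin is approximately proportional to the square of the number of that bin. It has been
found that this relationship, in which the bin number is approximately proportional to the square root of the
migration time, is obtained when the numbers of unit time intervals in the bins form an arithmetic series, that
is, when the difference in number of unit time intervals for consecutive bins is constant.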



The mathematical analysis is shown below. Let:

$N$ = total number of bins
$n = 1, 2, 3, \ldots, N$ = the consecutive numbers of the bins
$l_n$ = number of time interval units in the nth bin
$\delta l$ = difference in number of time interval units between consecutive bins (a constant)
$t_n$ = upper limit of migration time of the nth bin
$T_n$ = mean migration time of the nth bin
$\delta t$ = width of each time interval unit (a constant)
Fig. 3 depicts a schematic illustration of the bins of migration time. The number of time interval units in the
ith bin is given by:

$$l_i = l_1 + (i-1)\,\delta l \qquad (1)$$

and the total number of unit time intervals in the first n windows is given by:
$$\sum_{i=1}^{n} l_i = n\,l_1 + \tfrac{1}{2}\,n(n-1)\,\delta l \qquad (2)$$

based on the well-known relationship

$$\sum_{i=1}^{n} (i-1) = \tfrac{1}{2}\,n(n-1)$$

[Example: for n = 4, 0 + 1 + 2 + 3 = 6 = ½ × 4 × 3.]
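The step from equation (1) to equation (2) can be written out explicitly by summing equation (1) term by term and applying the relationship above:

$$\sum_{i=1}^{n} l_i = \sum_{i=1}^{n}\left[l_1 + (i-1)\,\delta l\right] = n\,l_1 + \delta l \sum_{i=1}^{n}(i-1) = n\,l_1 + \tfrac{1}{2}\,n(n-1)\,\delta l$$

For instance, with $l_1 = 2$ and $\delta l = 1$ the first three bins hold 2, 3 and 4 intervals, and $3 \cdot 2 + \tfrac{1}{2} \cdot 3 \cdot 2 \cdot 1 = 9 = 2 + 3 + 4$.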
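It follows that:

$$t_n = \delta t \sum_{i=1}^{n} l_i \qquad (3)$$

$$T_n = \tfrac{1}{2}\left(t_n + t_{n-1}\right) = \frac{\delta t}{2}\left[\sum_{i=1}^{n} l_i + \sum_{i=1}^{n-1} l_i\right] \qquad (4)$$

Substituting equation (2) into (4) and simplifying gives:

$$T_n = \frac{\delta t}{2}\left[(n^2 - 2n + 1)\,\delta l + (2n - 1)\,l_1\right] \qquad (5)$$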
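As a check on the algebra, the following Python sketch computes $T_n$ both directly from equations (3) and (4) and from the closed form of equation (5), using exact rational arithmetic (`fractions.Fraction`) so the two can be compared for equality. The parameter values are arbitrary illustrations, not values taken from the examples in the figures.

```python
from fractions import Fraction


def upper_limit(n: int, l1: int, dl: int, dt: Fraction) -> Fraction:
    # Equation (3): t_n = dt * (l_1 + ... + l_n), with l_i = l1 + (i - 1) * dl
    # from equation (1). For n = 0 the sum is empty, so t_0 = 0.
    return dt * sum(l1 + (i - 1) * dl for i in range(1, n + 1))


def mean_time_direct(n: int, l1: int, dl: int, dt: Fraction) -> Fraction:
    # Equation (4): mean of the lower limit t_{n-1} and the upper limit t_n of bin n.
    return (upper_limit(n, l1, dl, dt) + upper_limit(n - 1, l1, dl, dt)) / 2


def mean_time_closed_form(n: int, l1: int, dl: int, dt: Fraction) -> Fraction:
    # Equation (5): T_n = (dt / 2) * [(n^2 - 2n + 1) * dl + (2n - 1) * l1].
    return dt / 2 * ((n * n - 2 * n + 1) * dl + (2 * n - 1) * l1)


# Arbitrary example: 3 intervals in bin 1, growing by 2 per bin, dt = 0.1 s.
dt = Fraction(1, 10)
assert all(
    mean_time_direct(n, 3, 2, dt) == mean_time_closed_form(n, 3, 2, dt)
    for n in range(1, 51)
)
```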

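It can be seen that when n is large, $T_n$ can be expressed approximately as:

$$T_n \approx A n^2 + B \qquad (6)$$

where A and B are constants, with $A = \tfrac{1}{2}\,\delta l\,\delta t$ from the leading term of equation (5). The mean
migration time can therefore be expressed approximately in terms of the bin number:

$$T_n \propto n^2 \qquad (7)$$

or

$$n \propto \sqrt{T_n} \qquad (8)$$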
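The approximation in equations (6) to (8) can be illustrated numerically. The sketch below, with arbitrary illustrative values for $l_1$, $\delta l$ and $\delta t$, evaluates equation (5), measures how far $T_n / n^2$ is from $A = \delta l\,\delta t / 2$, and inverts equation (8) to estimate the bin number from a mean migration time. The helper names are hypothetical.

```python
import math


def mean_time(n: int, l1: float, dl: float, dt: float) -> float:
    # Equation (5): exact mean migration time of bin n.
    return dt / 2 * ((n * n - 2 * n + 1) * dl + (2 * n - 1) * l1)


def relative_error_of_n_squared_law(n: int, l1: float, dl: float, dt: float) -> float:
    # Relative deviation of T_n / n^2 from A = dl * dt / 2, the leading coefficient.
    a = dl * dt / 2
    return abs(mean_time(n, l1, dl, dt) / (n * n) - a) / a


def estimated_bin_number(t_mean: float, dl: float, dt: float) -> float:
    # Equation (8) with the constant made explicit: n is roughly sqrt(T_n / A).
    return math.sqrt(t_mean / (dl * dt / 2))


if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        print(n, relative_error_of_n_squared_law(n, l1=10, dl=2, dt=0.1))
```

From equation (5) the relative error works out to $(2n-1)(l_1 - \delta l)/(\delta l\, n^2)$, roughly $2(l_1 - \delta l)/(\delta l\, n)$, so it shrinks like $1/n$. When $l_1 = \delta l$ the linear and constant terms cancel and $T_n = A n^2$ holds exactly.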
Binning in this manner substantially reduces the number of memory locations required. For the portion of the
electropherogram shown in Fig. 1, the number of data points is 120 instead of 4,000. This means a substantial
saving in data storage, and the saving is greater for a larger overall migration time. Fig. 4 shows the
electropherogram replotted with binned data using the binning method of the present invention. Peaks in the
electropherogram are more clearly defined, allowing easier interpretation of the result of the electrophoresis
separation. With the square-root representation, better resolution and signal-to-noise are obtained.

In a conventional electropherogram, the rate of data acquisition could be increased to improve resolution, but
this increases the number of data points to be stored and results in poor signal-to-noise for low data values.
By binning the data according to the present invention, resolution and signal-to-noise are improved without
sacrificing memory storage locations; in fact, the number of memory locations required is reduced.
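Why pooling improves signal-to-noise can be illustrated under an idealized noise model, which is an assumption and not a statement about the actual detector: if each raw reading is a constant signal plus independent noise, summing k readings multiplies the signal by k but the noise standard deviation only by $\sqrt{k}$, so the signal-to-noise ratio grows by about $\sqrt{k}$. The NumPy sketch below checks this by simulation; `pooled_snr_gain` is a hypothetical helper.

```python
import numpy as np


def pooled_snr_gain(signal: float, noise_sd: float, k: int, num_bins: int, seed: int = 0) -> float:
    """Simulated SNR of a k-sample bin divided by the SNR of a single sample.

    Idealized model: every raw reading equals a constant `signal` plus
    independent Gaussian noise with standard deviation `noise_sd`.
    """
    rng = np.random.default_rng(seed)
    raw = signal + rng.normal(0.0, noise_sd, size=(num_bins, k))
    single_snr = signal / raw.std()            # noise estimated over all readings
    pooled = raw.sum(axis=1)                   # one pooled value per bin
    pooled_snr = (k * signal) / pooled.std()   # signal adds k-fold, noise about sqrt(k)-fold
    return float(pooled_snr / single_snr)


if __name__ == "__main__":
    for k in (1, 4, 16, 64):
        print(k, pooled_snr_gain(1.0, 1.0, k, 20000), np.sqrt(k))
```

Under this model, bins later in the run, which contain more unit intervals, receive a correspondingly larger gain.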

The binned data can be further processed to perform noise filtering. For further data analysis, the binned
data can be Fourier transformed. In the binned representation the peaks are of approximately equal band width
and spacing, whereas the band width and spacing of peaks in the raw data have been found to increase slightly
with migration time, which complicates subsequent data analysis by Fourier transformation. Fig. 5 shows a
Fourier transformed representation of an electropherogram plotted with the binned data. Fourier transformation
techniques are well known in the art and will not be discussed further here.
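The text does not spell out the Fourier filter used. One common choice, shown here purely as an illustrative substitute, is a low-pass filter: transform the binned trace, zero the highest-frequency components, and transform back. The sketch uses NumPy's real FFT (`numpy.fft.rfft` and `numpy.fft.irfft`); `fourier_lowpass` and its `keep_fraction` parameter are hypothetical.

```python
import numpy as np


def fourier_lowpass(binned: np.ndarray, keep_fraction: float) -> np.ndarray:
    """Zero the highest-frequency Fourier components of a binned trace.

    keep_fraction is the fraction of the non-negative frequency components,
    lowest frequencies first, that is retained (a hypothetical tuning knob).
    """
    spectrum = np.fft.rfft(binned)
    cutoff = max(1, int(round(keep_fraction * spectrum.size)))
    spectrum[cutoff:] = 0.0                      # discard high-frequency noise
    return np.fft.irfft(spectrum, n=binned.size)  # back to the migration-time axis


if __name__ == "__main__":
    x = np.arange(256)
    clean = np.sin(2 * np.pi * 5 * x / 256)
    noisy = clean + np.random.default_rng(1).normal(0.0, 0.5, x.size)
    filtered = fourier_lowpass(noisy, 0.1)
    print(np.mean((noisy - clean) ** 2), np.mean((filtered - clean) ** 2))
```

A single fixed cutoff of this kind suits data whose peaks have roughly uniform width; where peak widths drift along the trace, one cutoff either blurs narrow peaks or leaves noise around broad ones.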

In Fig. 5, peaks having amplitudes above the zero axis correspond to the presence of DNA fragments of
progressively increasing lengths. The fragments are detected by the detector 24, and since the detector
response may be different for different terminating nucleotides, the terminating nucleotide of each fragment
can be determined. By identifying the sequence of terminating nucleotides for progressively longer fragments,
the sequence of the DNA sample can be determined, thereby allowing identification of the DNA sample.
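Locating peaks above the zero axis, as described for Fig. 5, can be sketched as a search for local maxima with positive amplitude. This is a simplified illustration: assigning a terminating nucleotide to each peak would also need the per-nucleotide detector response, which is not reproduced here. `peaks_above_zero` is a hypothetical helper.

```python
from typing import List, Sequence


def peaks_above_zero(trace: Sequence[float]) -> List[int]:
    """Indices of local maxima whose amplitude lies above the zero axis.

    A point counts as a peak if it is positive, strictly greater than its left
    neighbour and not smaller than its right neighbour, so a flat-topped peak
    is reported once, at its first point.
    """
    peaks = []
    for i in range(1, len(trace) - 1):
        if trace[i] > 0 and trace[i] > trace[i - 1] and trace[i] >= trace[i + 1]:
            peaks.append(i)
    return peaks


if __name__ == "__main__":
    # Index order follows migration time, i.e. fragments of increasing length.
    print(peaks_above_zero([0, 1, 0, -1, -0.5, -1, 2, 3, 1]))
```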

While the present invention has been described with reference to the identification of peaks in electrophoresis
of DNA fragments, it will be apparent to those skilled in the art that various modifications and improvements
may be made without departing from the scope and spirit of the invention.



Claims 






1. A method of analyzing data obtained by electrophoresis of sequenced DNA fragments, CHARACTERISED
IN THAT the method comprises the steps of:

choosing a plurality of bins each of a size corresponding to a range of migration time, wherein the
sizes of the bins increase progressively at higher migration time;

pooling the data into the plurality of bins;

constructing an electropherogram, whereby the signal to noise and resolution have been improved
by the pooling of data into the bins; and

identifying peaks in the electropherogram to determine the sequence of the DNA fragments.

A method of identifying sequenced DNA fragments, CHARACTERISED IN THAT the method comprises
the steps of:

performing electrophoresis on the sequenced DNA fragments to separate the fragments;
migrating the fragments to move past a detector which detects the presence of the fragments;
providing an output of the detector in the form of data with respect to migration time;
choosing a plurality of bins each of a size corresponding to a range of migration time, wherein the
sizes of the bins increase progressively at higher migration time;
pooling the data into the plurality of bins;

constructing an electropherogram, whereby the signal to noise and resolution have been improved
by the pooling of data into the bins; and

identifying peaks in the electropherogram to determine the sequence of the fragments.

5. A method as claimed in any of claims 1-4, CHARACTERISED IN THAT each bin is made up of time
intervals of equal size, and the size of each bin is chosen such that the numbers of time intervals
in consecutive bins follow an arithmetic series.

6. A method as claimed in claim 5, CHARACTERISED IN THAT the average migration time of each bin
is approximately proportional to the square of the number of that bin.

7. A method as claimed in any of claims 1-4, CHARACTERISED IN THAT the method further comprises
the step of filtering the binned data by Fourier transformation.

8. A system for electrophoresis, CHARACTERISED IN THAT the system comprises:
means for performing electrophoresis on a sample to separate it into its species;
a detector for detecting the presence of the species and for producing corresponding data with
respect to migration time;

means for migrating the species past the detector so as to detect the presence of the species in
sequence;

data storage means for choosing a plurality of bins in the data storage, each of a size corresponding
to a range of migration time, wherein the sizes of the bins increase progressively at higher
migration time;

means for pooling the data into the plurality of bins; and

means for constructing an electropherogram, whereby the signal to noise and resolution have been
improved by the pooling of data into the bins.

9. An apparatus as claimed in claim 8, CHARACTERISED IN THAT each bin is made up of time
intervals of equal size, and the size of each bin is chosen such that the numbers of time intervals in
consecutive bins follow an arithmetic series.

10. An apparatus as claimed in claim 9, CHARACTERISED IN THAT the average migration time of each
bin is approximately proportional to the square of the number of that bin.


11. An apparatus as claimed in claim 9, CHARACTERISED IN THAT the apparatus further comprises
means for filtering the binned data by Fourier transformation.